Creating an Amazon EMR Cluster
In this tutorial, we will see how to create a new Amazon EMR cluster.
1. Log in to AWS and go to the EMR console
Sign in to the AWS Management Console with your AWS account and open the Amazon EMR console at
https://console.aws.amazon.com/elasticmapreduce/
2. Click Create Cluster
The Create Cluster page is divided into the following sections:
· General Configuration
· Software Configuration
· Hardware Configuration
· Security and Access
General Configuration
In the General Configuration section, we will configure the following settings.
Cluster name: Enter a descriptive name for your cluster.
Logging: This determines whether Amazon EMR writes detailed log data to Amazon S3.
Launch mode: Cluster
Software Configuration
In the Software Configuration section, we will configure the following settings.
Vendor: Amazon
Release: Choose emr-4.5.0 as the EMR version
Applications: Choose Spark
Spark: Spark 1.6.1 on Hadoop 2.7.2 YARN with Ganglia 3.7.2
Hardware Configuration
In the Hardware Configuration section, we will configure the following settings.
Instance type: Select your instance type
Number of instances: Select the number of instances for the EMR cluster. One instance is dedicated to the master node and the remaining instances are core nodes.
Note: The minimum number of nodes is 3.
Security and Access
EC2 key pair: Choose the key pair used to access your EMR instances
Permissions: Default. This uses the default IAM roles.
Review your configuration and, if you are satisfied with the settings, click Create Cluster.
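The console settings above correspond to a single request in the EMR API. The following is a minimal sketch, assuming Python and the boto3 SDK, of how that request could be assembled for boto3's `run_job_flow` call. The helper name `build_cluster_request`, the cluster name, the log bucket, the key pair name and the `m3.xlarge` instance type are hypothetical placeholders and not values from this tutorial. `EMR_DefaultRole` and `EMR_EC2_DefaultRole` are the names of the default IAM roles.

```python
def build_cluster_request(name, log_uri, key_name,
                          instance_type="m3.xlarge", instance_count=3):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All argument values are supplied by the caller; the defaults here
    are illustrative assumptions only.
    """
    if instance_count < 3:
        # The tutorial asks for 1 master node plus at least 2 core nodes.
        raise ValueError("this tutorial requires at least 3 instances")
    return {
        "Name": name,                      # Cluster name
        "LogUri": log_uri,                 # Logging: S3 location for logs
        "ReleaseLabel": "emr-4.5.0",       # Release
        "Applications": [{"Name": "Spark"}, {"Name": "Ganglia"}],
        "Instances": {
            # With InstanceCount = N, EMR uses 1 master and N - 1 core nodes.
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "Ec2KeyName": key_name,        # EC2 key pair
            # Launch mode "Cluster": keep running after steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "ServiceRole": "EMR_DefaultRole",      # default EMR role
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile
    }


# Hypothetical example values.
request = build_cluster_request(
    name="my-spark-cluster",
    log_uri="s3://my-log-bucket/emr-logs/",
    key_name="my-key-pair",
)
```

With boto3 installed and AWS credentials configured, the request could be submitted with `boto3.client("emr").run_job_flow(**request)`; the response contains the new cluster's `JobFlowId`.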
Add HDFS Permission for Tomcat:
After the cluster is created, add a step to run on it.
Go to the created EMR cluster and click Add step
Enter the following parameters and click Add
Step type: Custom JAR
Name: ideata app
JAR location: s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Arguments: s3://bda-ideata/tomcat_hdfs_permission.sh
Action on failure: Continue
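The step parameters above can also be expressed as a step definition for the EMR API. This is a minimal sketch, assuming Python and boto3's `add_job_flow_steps` call; the helper name `build_script_step` is hypothetical, while the JAR location and script path are the ones listed above.

```python
def build_script_step(name, script_uri, region="us-east-1"):
    """Build a step definition that runs a shell script via script-runner.jar."""
    return {
        "Name": name,
        # "Continue" in the console corresponds to CONTINUE in the API.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar" % region,
            # script-runner.jar takes the script location as its first argument.
            "Args": [script_uri],
        },
    }


step = build_script_step(
    name="ideata app",
    script_uri="s3://bda-ideata/tomcat_hdfs_permission.sh",
)
```

Given a cluster ID (here a hypothetical variable `cluster_id`), the step could be submitted with `boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])`, which returns the IDs of the added steps under `StepIds`.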
Verify that the step you added completed successfully: open the Steps drop-down and check that its status is “Completed”.
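The same check can be scripted. The sketch below assumes Python and a boto3 EMR client, whose `describe_step` call reports the step state under `Step.Status.State`. The function name `wait_for_step`, the polling interval and the poll limit are hypothetical choices for illustration.

```python
import time

# States after which a step will not change any further.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(emr_client, cluster_id, step_id,
                  poll_seconds=30, max_polls=120):
    """Poll a step until it reaches a terminal state and return that state."""
    for _ in range(max_polls):
        response = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = response["Step"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("step %s did not finish in time" % step_id)
```

A caller would pass `boto3.client("emr")` as `emr_client` and treat any return value other than `"COMPLETED"` as a failed permission setup.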